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1.  INTRODUCTION 


With  the  continuing  advances  in  computing  hardware  and  the  advent  of 
dedicated  parallel  processors,  it  is  possible  to  place  more  and  more  com¬ 
puting  power  in  small  devices.  This  increased  capability  is  particularly 
important  for  real-time  image  processing  applications  which  require  extre¬ 
mely  high  throughput.  Examples  of  such  application  areas  are  robotic 
vision,  medical  diagnosis  and  military  target  acquisition  and  tracking, 
such  as  SDI.  Current  algorithms  tend  to  be  ad  hoc  in  nature,  typically 
consisting  of  a  cascade  of  processors  which  in  some  cases  work  at  cross 
purposes.  For  example,  a  2D  filter  is  often  used  to  reduce  noise  with  high 
spatial  frequency  content.  However,  this  operation  tends  to  blur  edges 
which  must  be  detected  later  to  determine  object  boundaries.  A  more 
suitable  approach  would  be  to  combine  edge  detection  and  noise  reduction  in 
a  single  step,  if  possible.  Such  an  approach  would  have  the  potential  for 
increased  performance,  particularly  at  low  signal/noise  ratios,  where  false 
alarm  and  miss  rates  tend  to  increase  rapidly  in  current  systems. 

Future  image  processing  systems  should  be  able  to  solve  a  variety  of  image 
understanding  problems,  such  as: 

•  image  segmentation 

•  surface  reconstruction 

•  stereo  matching 

•  determining  structure  from  motion. 

The  field  of  computational  vision  is  dedicated  to  solving  these  types  of 

problems  and  has  been  developing  rapidly  over  the  past  fifteen  years. 


Within  the  last  four  years,  some  exciting  developments  have  occurred  which 
show  great  promise  in  providing  coordinated  solutions  to  these  problems 
utilizing  distributed  algorithms.  These  developments  are  based  on  uti¬ 
lizing  a  probabilistic  framework.  The  two-dimensional  image  is  modeled  as 
a  random  field  which  has  to  be  estimated  in  real  time  from  a  set  of  noisy 
ambiguous  measurements  from  multiple  sensors.  A  Bayesian  viewpoint  is 
adopted,  in  which  the  prior  knowledge  is  expressed  as  a  probability  distri¬ 
bution.  Using  a  probabilistic  description  of  the  observation  noise,  the 
posterior  distribution  of  the  random  field  can  be  computed.  These  models 
are  based  on  the  use  of  Markov  random  fields  and  the  Gibbs  distribution. 
Significantly,  these  assumptions  lead  to  distributed  algorithms  which  may 
be  implemented  on  parallel  processors.  There  are  several  other  important 
advantages  in  using  this  approach  (Morroquin,  Mitter  and  Poggio,  1986).  It 
is  possible  to  model  both  piecewise  continuous  surfaces  and  the  boundaries 
between  smooth  patches  (targets,  clouds,  objects,  e.g.).  It  provides  a 
general  framework  for  solving  all  of  the  problems  mentioned  above.  The 
parameters  that  appear  in  the  reconstruction  algorithms  have  a  precise  sta¬ 
tistical  interpretation  which  may  be  validated  on  physical  grounds. 

1. 1  Computational  Vision  Systems 

The  standard  definition  of  computational  vision  is  that  it  is  inverse 
optics.  The  direct  problem-the  problem  of  classical  optics-or  computer 
graphics-is  to  determine  the  images  of  three-dimensional  objects. 
Computational  vision  is  confronted  with  inverse  problems  of  recovering  sur¬ 
faces  from  images.  Much  information  is  lost  during  the  imaging  process 
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that  projects  a  three-dimensional  world  into  two-dimensional  arrays 
(images).  As  a  consequence,  vision  must  rely  on  natural  constraints,  that 
is,  general  assumptions  about  the  physical  world  to  derive  an  unambiguous 
output.  This  is  typical  of  many  inverse  problems  in  mathematics  and  phy¬ 
sics. 


The  first  part  of  vision  -  from  images  to  surfaces  -  has  been  called 
early  vision.  The  common  characteristics  of  most  early  vision  problems,  in 
a  sense  their  deep  structure,  can  be  formalized:  early  vision  problems  are 
ill-posed  in  the  sense  defined  by  Hadamard  (Poggio  and  Torre  (1984)).  A 
problem  is  well-posed  when  its  solution  (a)  exists,  (b)  is  unique  and  ( c) 
depends  continuously  on  the  initial  data.  Ill-posed  problems  fail  to 
satisfy  one  or  more  of  these  criteria. 

1.2  Standard  Regularization  in  Early  Vision 

The  main  idea  for  "solving"  ill-posed  problems  is  to  restrict  the  class 
of  admissible  solutions  suitable  a  priori  knowledge.  In  standard  regulari¬ 
zation  methods,  due  mainly  to  Tikhonov,  the  regularization  of  the  ill-posed 
problem  of  finding  z  from  the  data  y:  Az  =  y  requires  the  choice  of  norms 
||*||  and  of  a  stabilizing  functional  ||Pz||.  In  standard  regularization 
theory,  A  is  a  linear  operator,  the  norms  are  quadratic  and  P  is  linear.  A 
method  that  can  be  applied  is: 

Find  z  that  minimizes  | (Az  -  y||2  +  A||pz||2f  where  A  is  a  so-called 
regularization  parameter. 

In  this  method,  A  controls  the  compromise  between  the  degree  of  regu¬ 
larization  of  a  solution  and  its  closeness  to  the  data.  P  embeds  the  phy- 
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sical  constraints  of  the  problem.  It  can  be  shown  for  quadratic 
variational  principles  that  under  mild  conditions  the  solution  space  is 
convex  and  a  unique  solution  exists. 

Poggio  et  al  (1984,  1985)  show  that  several  problems  in  early  vision 
can  be  "solved"  by  standard  regularization  techniques.  Surface 
reconstruction,  optical  flow  at  each  point  in  the  image,  optical  flow 
along  contours,  color,  stereo  can  be  computed  by  using  standard  regulariza¬ 
tion  techniques.  Variational  principles  that  are  not  exactly  quadratic  but 
have  the  same  form  as  that  above  can  be  used  for  other  problems  in  early 
vision.  The  main  results  of  Tikhonov  can,  in  fact,  be  extended  to  some 
cases  in  which  the  operators  A  and  P  are  nonlinear,  provided  they  satisfy 
certain  conditions.  (Morozov,  1984.) 

1.3  Limitations  of  Standard  Regularization  Theory 

Standard  regularization  theory  with  linear  A  and  P  is  equivalent  to 
restricting  the  space  of  solution  to  generalized  splines,  whose  order 
depends  on  the  order  of  the  stabilizer  P.  This  means  that  in  some  cases 
the  solution  is  too  smooth,  and  cannot  be  faithful  in  locations  where 
discontinuities  are  present.  In  optical  flow,  surface  reconstruction  and 
stereo,  discontinuities  are  in  fact  not  only  present,  but  also  the  most 
critical  locations  for  subsequent  visual  information  processing.  Standard 
regularization  cannot  deal  well  with  another  critical  problem  of  vision, 
the  problem  of  fusing  information  from  different  early  vision  modules. 

Since  the  regularizing  principles  of  the  standard  theory  are  quadratic, 
they  lead  to  linear  Euler-Lagrange  equations.  The  output  of  different 
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modules  can  therefore  be  combined  only  in  a  linear  way.  Terzopoulous 
(1984;  see  also  Poggio  et  al.,  1985)  has  shown  how  standard  regularization 
techniques  can  be  used  in  the  presence  of  discontinuities  in  the  case  of 
surface  interpolation.  After  standard  regularization,  locations  where  the 
solution  f  originates  a  large  error  in  the  regularization  term,  are  iden¬ 
tified  (this  needs  setting  a  threshold  for  the  error  in  smoothness).  A 
second  regularization  step  is  then  performed  using  the  location  of  discon¬ 
tinuities  as  boundary  conditions. 

In  any  case,  one  would  like  a  more  comprehensive  and  coherent  theory 
capable  of  dealing  directly  with  the  problem  of  discontinuities  and  the 
problem  of  fusing  information.  So  the  challenge  for  a  regularization 
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theory  of  early  vision  is  to  extend  it  beyond  standard  regularization 


methods  and  their  most  obvious  non-linear  versions. 
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1.4  Stochastic  Approach  to  Regularizing  Early  Vision 

In  this  research,  we  have  developed  a  rigorous  approach  to  overcome  part 
of  the  ill-posedness  of  vision  problems,  based  on  Bayes  estimation  and 
Markov  Random  Field  models,  that  effectively  deals  with  the  problems  faced 
by  the  standard  regularization  approach.  In  this  approach,  the  a  priori 
knowledge  is  represented  in  terms  of  an  appropriate  probability  distribu¬ 
tion,  whereas  in  standard  regularization  a  priori  knowledge  leads  to 
restrictions  on  the  solution  space.  This  distribution,  together  with  a 
probabilistic  description  of  the  noise  that  corrupts  the  observations, 
allows  one  to  use  Bayes  theory  to  compute  the  posterior  distribution  Pf|g, 
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which  represents  the  likelihood  of  a  solution  f  given  the  observations  g. 

In  this  way,  we  can  solve  the  reconstruction  problem  by  finding  the  esti¬ 
mate  ?  which  either  maximizes  this  a  posteriori  probability  distribution 
(the  so  called  Maximum  a  Posteriori  or  MAP  estimate),  or  minimizes  the 
expected  value  (with  respect  to  Pf  |g)  of  an  appropriate  error  function. 

The  class  of  solutions  that  can  be  obtained  in  this  way  is  much  larger  than 
in  standard  regularization. 

The  price  to  be  paid  for  this  increased  flexibility  is  computational 
complexity.  New  parallel  architectures  and  possibly  hybrid  computers  of  the 
digital-analog  type  promise  however  to  deal  effectively  with  the  com¬ 
putational  requirements  of  the  methods  proposed  here. 

We  wish  to  emphasize  here  that  our  main  thrust  here  is  in  development  of 
distributed  algorithms  suitable  for  parallel  architectures,  and  on  compre¬ 
hensive  testing  on  image  data. 

1 . 5  Phase  I  Technical  Objectives 

Our  research  objectives  were  to: 

•  develop  new  distributed  algorithms  for  recovering  structure  from 
motion  and  for  discrimination  of  known  or  unknown  objects  from 
highly  cluttered  background. 

•  assess  feasibility  of  real-time  operation  using  state-of-the-art 
parallel  processors 

•  evaluate  performance  using  highly-cluttered  image  data  containing 


moving  targets 


2.  IMAGES  AS  FUNCTIONS  OF  MARKOV  RANDOM  FIELDS 


The  key  to  success  in  the  use  of  the  proposed  approach,  is  the  ability 
to  find  a  class  of  stochastic  models  (that  is,  random  fields)  that  have  the 
following  characteristics : 

(i)  The  probabilistic  dependencies  between  the  elements  of  the  field 
should  be  local.  This  condition  is  necessary  if  the  field  is  to 
be  used  to  model  surfaces  that  are  only  piecewise  smooth;  besi¬ 
des,  if  it  is  satisfied,  the  reconstruction  algorithms  are  likely 
to  be  distributed,  and  thus,  efficiently  implementable  in 
parallel  hardware. 

(ii)  The  class  should  be  rich  enough,  so  that  a  wide  variety  of  quali¬ 
tatively  different  behaviors  can  be  modeled. 

(iii)  The  relation  between  the  parameters  of  the  models  and  the  charac¬ 
teristics  of  the  corresponding  sample  fields  should  be  relatively 
transparent,  so  that  the  models  are  easy  to  specify. 

(iv)  It  should  be  possible  to  represent  the  prior  probability  distri¬ 
bution  Pf  explicitly,  so  that  Bayes  theory  can  be  applied. 

(v)  It  should  be  possible  to  specify  efficient  Monte  Carlo  proce¬ 

dures,  both  for  generating  sample  fields  from  the  distribution, 
so  that  the  capability  of  the  model  to  represent  our  prior 
knowledge  can  be  verified,  and  to  compute  the  optimal  estimators. 

A  class  of  random  fields  that  satisfies  these  requirements  is  the  class 
of  Markov  Random  Fields  (MRF's)  on  finite  lattices.  A  MRF  has  the  property 
that  the  probability  distribution  of  the  configurations  of  the  field  can 
always  be  expressed  in  the  form  of  a  Gibbs  distribution: 


where  Z  is  a  normalizing  constant,  Tq  is  a  parameter  (known  as  the  "natural 
temperature"  of  the  field)  and  the  "Energy  function"  E(f)  is  of  the  form: 

E(f)  =  Z  Vc( f ) 

C 

where  C  ranges  over  the  "cliques"  associated  with  the  neighborhood  system 
of  the  field,  and  the  potentials  V^( f )  are  functions  supported  on  them  (a 
clique  is  either  a  single  site,  or  a  set  of  sites  such  that  any  two  sites 
belonging  to  it  are  neighbors  of  each  other). 

We  will  assume  that  the  available  observations  g  are  obtained  from  a 
typical  realization  f  of  the  field  by  a  degrading  operation  (such  as 
sampling)  followed  by  corruption  with  i.i.d.  noise  (the  form  of  whose 
distribution  is  known) ,  so  that  the  conditional  distribution  can  be  written 
as : 


Pg|f(g;f)  =  ex  p  [  -  a  Z  <t»i(  f  ,gi)  ] 

ieS 

where  { <t> i }  are  some  known  functions,  and  a  is  a  parameter. 

distribution  is  obtained  from  Bayes  rule: 

The  posterior 

pf|g(f;g)  -  exp[~Ep(f ;g)] 

(2.2) 

EP(f;g)  -  ~~  E(f)  +  a  Z  *(f,gi) 

0  ieS 

(2.3) 

It  is  important  to  note  that  the  Markov  structure  is  retained  under 


conditioning  and  that  the  posterior  distribution  is  also  a  Gibbs 


Cost  Functionals 


The  Bayesian  approach  to  the  solution  of  reconstruction  problems  has 
been  adopted  by  several  researchers.  In  most  cases,  the  criterion  for 
selecting  the  optimal  estimate  has  been  the  maximization  of  the  posterior 
probability  (the  Maximum  aposteriori  or  MAP  estimate).  It  has  been  used, 
for  example,  by  Geman  and  Geman  (1984)  for  the  restoration  of  piecewise 
constant  images;  by  Grenander  (1984)  for  pattern  recognition,  and  by  Elliot 
et.  al.  (1983)  and  Hansen  and  Elliot  (1982)  for  the  segmentation  of  tex¬ 
tured  images  (a  similar  criterion  -  the  maximization  of  a  suitably  defined 
likelihood  function  -  has  been  used  by  Cohen  and  Cooper  (1984)  for  the  same 
purposes) . 

In  some  other  cases,  a  performance  criterion,  such  as  the  minimization 
of  the  mean  squared  error  has  been  implicitly  used  for  the  estimation  of 
particular  classes  of  fields.  For  example,  for  continuous-valued  fields 
with  exponential  autocorrelation  functions,  corrupted  by  additive  white 
Gaussian  noise,  Nahi  and  Assefi  (1972)  and  Habibi  (1972)  have  used  causal 
linear  models  and  optimal  (Kalman)  linear  filters  for  solving  the 
reconstruction  problem. 

Although  other  criteria  are  possible  (cf  Morroquin,  1985),  we  have 
chosen  the  MAP  criterion  here  for  designing  optimal  estimators  since  this 
criterion  gives  generally  equivalent  results,  with  the  exception  of  very 
high  noise  situations.  The  performance  of  estimators  using  other  criteria 
would  be  an  appropriate  topic  for  Phase  II  research. 


ESTIMATING  STRUCTURE  FROM  MOTION 


Since  the  primary  goal  of  image  processing  will  be  target  detection, 
discrimination  and  monitoring,  we  can  exploit  the  fact  that  the  target 
image  will  be  moving  relative  to  most  of  the  rest  of  the  image. 

A  common  approach  to  the  problem  is  the  use  of  flow  fields  (for 
example,  Ullman,  1981;  Bruss  and  Horn,  1981;  Williams,  1981,  Hildreth, 

1984;  Adiv,  1984).  These  approaches  generally  assume  deterministic  models 
or  utilize  standard  regularization  techniques.  We  have  already  alluded  to 
several  problems  encountered  when  using  these  approaches  and  the  potential 
advantages  of  a  statistically-based  approach  based  on  local  interaction 
models. 

Reed,  et  al  (1983)  have  developed  a  three— dimensional  matched  filtering 
approach  to  moving  target  detection.  However,  it  is  limited  to  point 
targets.  Legters  and  Young  (1982)  developed  an  operator— based  approach 
using  foreground  and  background  models  and  solved  a  least-squares  minimiza¬ 
tion  problem.  No  statistical  object  model  was  used.  Miller,  et  al  (1985) 
have  studied  the  general  moving  target  detection  problem,  restricted  to 
point  targets,  and  concluded  that  the  uniformly  most  powerful  detector, 
invariant  with  respect  to  image  intensity  variations,  consists  of  specific 
spatial-temporal  differencing  schemes.  In  the  sequal,  we  derive  similar 
conclusions  for  a  much  more  general  object  model. 

We  suggest  that  the  MRF  methodology  may  be  employed  to  recover  object 
motion  from  successive  images.  The  key  to  the  approach  lies  in 
appropriately  defining  the  potential  function. 
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3. 1  Optical  Flow 

Let  vx(x,y),  Vy(x,y)  be  the  components  of  the  velocity  vector  at  the 
point  (x,y)  on  the  image.  Then  vx  and  Vy  can  be  estimated  by  using  a  flow 
equation 


M(x,y,t)  =  Vj 


3f(x,y) 

5x 


9f(x,y) 

3y 


3f(x,y) 

Tt 


where  3f(x,y)/3x  and  3f(x,y)/3y  are  spatial  gradients  of  the  image  inten¬ 
sity  at  (x,y)  and  3f(x,y)/3t  is  found  by  time  differencing  successive  ima¬ 
ges.  All  terms  are  readily  estimated  by  using  numerical  differencing 
methods.  The  solution  to  this  equation  is  a  locus  of  points  along  a 
straight  line.  By  evaluating  solutions  around  a  neighborhood  of  (x,y),  one 
can  determine  estimates  for  vx(x,y)  and  Vy(x,y)  by  the  intersection  of  the 
individual  solutions  over  pixels  within  a  neighborhood. 


Flow  field  methods  may  not  be  optimal  for  the  SDI  problem,  however. 

The  critical  aspect  of  the  tracking  problem  we  are  addressing  is  the  abi¬ 
lity  to  handle  cluttered  background  and  foreground  noise  at  very  low 
signal/noise  ratios.  Methods  to  date  are  generally  based  on  moving  target 
indicator  (MTI)  technology  and  use  flow  field  analysis.  These  methods 
assume  smoothness  properties  which  do  not  hold  for  objects  covering  only  a 
few  pixels,  when  only  a  few  grey  levels  are  used,  or  for  the  case  when 
object  velocities  are  on  the  order  of  one  pixel/sample,  which  is  likely  to 
occur  in  an  SDI  enviorment.  Differencing  operations  are  required  which  can 
lead  to  large  errors  in  high  noise  conditions.  In  these  cases,  detection 
and  tracking  accuracy  will  be  degraded  by  the  effects  of  incorrect  assump¬ 
tions  in  the  problem  formulation. 
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The  approach  we  are  studying  avoids  these  problems  by  analyzing  the 
image  on  a  pixel-by-pixel  basis,  with  no  impicit  smoothness  assumptions. 
Smoothness  may  be  employed  explictly  as  required,  for  example  to  develop  a 
local  Markov  random  field  model. 


3.2  Batch  Processing  Formulation 


We  are  interested  in  recovering  the  intensity  field  of  an  object  moving 
in  a  highly  cluttered  environment.  The  object  is  modeled  using  a  Gibbs 
distribution  of  the  form 


Pf  =  1  exp[-Vc(f)/T] 


(3.1) 


where  z  is  a  normalizing  constant,  f  is  the  object  intensity  field,  T  is 
the  "temperature",  and  Vc(f)  is  an  appropriately  chosen  potential  function. 
We  have  experimented  with  several  different  potential  functions,  including 
the  Ising  model,  with  the  result  that  the  following  function 


vc(f)  =  l  |  fi  -  fj  |  . 

«i 


(3.2) 


where  Nj  is  the  neighborhood  of  the  i— —  pixel  and  f^  is  the  intensity  of 
the  i-^  pixel,  works  reasonably  well.  In  our  research  we  have  used  a  4  - 
connected  neighborhood  consisting  of  the  4  adjacent  pixels.  The  probabi¬ 
lity  density  P^  describes  the  apriori  information. 

In  order  to  process  cluttered  images  we  need  a  stochastic  image  model 
which  accounts  for  object  motion,  background  clutter,  and  image  noise.  We 
have  used  the  following  model  for  the  observed  image  g  at  time  step  k: 


J 


■ .  ,  ■  .  « .  ’j,  ->  ’»  *  •  v  v  /  ■*  v.  r.#-  •’»  *V*’-*V*V<« i  Av  v 


g*(k)  =  bA[  l-s(  f  ±k)  J  +  fj.k  +  ni(  k) 


(3.3) 


where  b^  is  the  (fixed)  background,  ik  indexes  the  pixels  according  to  the 
velocity  of  the  object,  and  s(.)  is  a  function  which  is  one  if  the  object 
covers  pixel  ik  at  time  step  k  and  is  zero  otherwise.  For  a  two- 
dimensional  field  indexed  by  p,q: 

pk  =  p  -  k  vx 
qk  =  q  -  k  vy 

where  vx,  Vy  are  orthogonal  velocity  components.  Note  that  we  make  no 
smoothness  assumptions  such  as  is  done  in  visual  flow  field  analysis.  Thi 
allows  us  to  analyze  images  with  only  a  relatively  few  number  of  pixels. 

The  noise  n-j(k)  consists  of  two  processes 

ni(k)  =  nol  +  nki  (3.4) 

where  nQ^  is  a  time- invar iant  noise  present  at  all  time  steps  and  nkj  is  a 
time-varying  noise.  Both  processes  are  assumed  to  be  exponentially  distri¬ 
buted  with  known  mean  and  variances  and  are  mutually  independent.  By 
assuming  zero-mean  Gaussian  distributions,  the  problem  becomes  one  of  mini¬ 
mizing  the  following  energy  function  with  respect  to  b  and  f: 

E(f,g)  =  Vc(f)  /  T 

4  I  8i00  -  b 

i,k  | 

where  N  is  the  sum  of  the  variances  of  the  two  noise  processes. 

The  minimization  was  carried  out  as  follows: 


i  [  l-s( f  ik)  ]  - 


(3.3) 
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(i)  An  estimate  of  was  calculated  as  :  b^  =  b  with  b  the  mean  image 
intensity  over  the  entire  image  field. 

(ii)  Select  a  pixel  i  at  random. 

(iii)  Select  a  new  estimate  fj^  *  f^  from  the  set  of  allowable  intensities 
using  a  uniformly  distributed  random  variable. 

(iv)  Compute  the  resulting  change  in  energy  AE. 

(v)  If  AE  <  0,  accept  the  change,  i.e.,  set  f^  =  fi# 

(iv)  If  AE  >  0,  accept  the  change  if  exp  (-AE)  >  r,  where  r  is  a  uni¬ 
formly  distributed  random  variable  between  0  and  I;  otherwise, 
leave  f^  unchanged. 

(iiv)  Repeat  steps  (ii)  -  (vi). 

Steps  (ii)  -  (vi)  comprise  the  Metropolis  algorithm  (Metropolis,  et  al 
1953),  which  generates  a  regular,  reversible  Markov  chain.  The  resulting 
image  configurations  are  distributed  according  to  the  Gibbs  distribution 

j  exp  [-  E(f ,  g) ] 

Minimization  with  respect  to  velocity  was  performed  by  an  exhaustive 
search  over  all  possible  velocities,  using  a  course  quantization.  Note 
that  this  minimization  can  be  done  using  completely  independent  processors. 


We  have  conducted  experiments  on  several  different  cases,  with  pro¬ 
mising  results.  An  example  is  shown  in  Figure  (3.1),  where  a  square  AxA 


object  is  moving  at  the  rate  of  one  pixel/sample  to  the  right  over  a  clut¬ 
tered  10xlA  background  and  is  viewed  in  noise.  The  object  has  intensity  of 
2  over  a  background  whose  mean  intensity  is  1  with  an  rms  value  of  0.5. 

The  additive  noise  has  an  rms  value  of  0.5  and  T  =  1.  Four  intensity 
levels  (0, 1,2,3)  are  plotted.  Figure  (3.1)  shows  the  actual  object,  actual 
background  (panel  b)  and  A  successive  cluttered  images  of  the  moving  object 
(panels  c-f).  The  object  has  essentially  disappeared  in  the  noise. 

Recovery  is  shown  in  panels  g-k:  the  object  is  fully  recovered  using  only 
16  iterations/pixel  (we  are  assumung  parallel  implementation).  The  error 

shown  is  £  (fj[  -  fj)2  while  the  energy  shown  is  E  for  a  temperature  of  1. 
i 

The  initial  energy  (using  as  b  the  mean  pixel  intensity)  was  601.017.  The 
minimum  energy  solution  clearly  recovers  the  original  object.  Note  that 
this  case  corresponds  to  recovering  the  object  in  the  time  required  for  it 
to  move  by  one  length. 

The  previous  tests  used  the  Metropolis  algorithm  to  detect  moving 
objects.  We  have  found  that  the  algorithms  may  be  simplified  considerably 
without  any  significant  loss  of  accuracy  by  simplifying  the  iterations. 

The  simplification  is  to  only  accept  changes  which  lead  to  a  reduction  in 
the  energy  fuction. 

An  example  of  our  results  is  given  in  Figures  3.2  -  3.6.  In  Figure 
3.2,  a  series  of  six  noise-free  images  are  shown  in  which  a  3x3  object  of 
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Figure  3.1 


Detection  of  Moving  Object  in  Cluttered  Background  and  Noise 
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(f)  noisy  image  (k=4) 
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Figure  3.1  (cont'd)  Detection  of  Moving  Object  in  Cluttered  Background  and  Noise 


************** 

************** 

************** 

************** 

************** 

************** 

************** 

************** 

************** 


(g)  estimated  background 


energy  607.3359 
error  49 

(h)  object  estimate 

(1  iteration/pixel) 


llll 

tt > 

mi 

llll 

+  *  ••  • 

ilia 

II  1 

nn 

II  1 

■  • 

iiaa 

energy  563.8228 
error  8 

(i)  object  estimate 

(6  iterations/pixel) 


energy  552.8715 
error  1 


(j)  object  estimate 

(11  iterat ions  'pixel) 


llll 

■  Ill 
llll 

■  III 


energy  538.2726 
error  0 

(k)  object  estimate 

(1*>  i t era t  ion s /p  1  xel  ) 


-J.  '  -  '  * 


phi :  k» 


******************* 

**m************** 

**m************** 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 


phi:k=  4 


******************* 

******************* 

******************* 

******************* 

*****11************ 

*****m*********** 


******************* 

******************* 

******************* 

******************* 


phi:k=  2 


phi:k=  5 


******************* 

******************* 

******************* 

***111************* 

***m************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 

******************* 


ohi:k*  3 


******************* 
*******  *****  ******* 
******************* 
******************* 
*******  *******  ***  *  * 
******m********** 
******m********** 
******m********** 


******************* 


phi:k=  6 


******************* 
******************* 
****!««************ 
****111 ************ 


******************* 

******************* 

******************* 

******************* 

******************* 

******************* 


Figure  3.2  Noise  -  Free  Images  of  Moving  Object  (v  »  (1,1)). 
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Figure  3.3 


True  Object  In  Original  Position  ( '<  =  1 )  and 
Cluttered  Background. 
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Figure  3.4  Noisy  Images  Containing  Targe 
Barkeround  with  Additive  Forei 
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Figure  3.5  Object  Detection  (at  original  position)  for 
Velocity  Match  (0  =  (1,1)). 
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Figure  3.6  Object  Detection  for  Velocity 
Mismatch  ( v  =  ( 1  ,0 ) ) . 
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intensity  2  moves  over  a  13x19  background  of  intensity  1  with  velocity  v  = 
(1,1)  pixel/sample.  The  object  in  its  original  position  over  a  background 
of  intensity  0  is  shown  in  Figure  3.3.  In  Figure  3.3,  a  cluttered 
background  is  shown  which  was  generated  using  Gaussian  random  noise  with 
rms  value  of  0.8.  The  intensities  are  quantized  in  the  figure  to  four 
values  (0, 1,2,3).  Noisy  images  were  then  generated  by  adding  time- 
invariant  noise  (rms  =  0.8)  and  time-varying  noise  (rms  =  0.4)  to  the  image 
comprised  of  the  object  over  the  cluttered  background.  The  resulting  ima¬ 
ges  are  shown  in  Figure  3.4;  the  object  has  essentially  disappeared  in  the 
noise.  These  six  images  were  used  in  the  detection  algorithm  to  search  for 
the  presence  of  a  moving  object.  The  results  are  shown  in  Figure  3.5  and 
3.6.  In  Figure  3.5,  the  estimated  velocity  matches  the  actual  velocity  and 
the  object  is  recovered,  with  only  small  error  and  is  in  the  correct  posi¬ 
tion.  Three  cases  are  shown  for  n=l,  6,  11  where  n  is  the  number  of  global 
iterations  (a  global  iteration  is  comprised  of  one  iteration  per  pixel  over 
the  entire  image). 

In  Figure  3.6,  the  results  for  the  case  of  a  velocity  mismatch 
(v  =  (1,0))  are  shown:  two  long  slender  objects  are  detected  which  are 
aligned  along  the  estimated  velocity  direction  and  are  in  the  wrong  posi¬ 
tion.  However,  the  energy  is  higher  than  for  the  case  of  a  velocity  match, 
indicating  that  there  is  a  velocity  error.  Thus,  the  minimum  energy  solu¬ 
tion  yields  the  correct  velocity,  as  desired.  The  error  values  shown  in 
the  figures  are  the  true  (unobservable)  fit  error  measured  in  terms  of  the 
pixel-by-pixel  difference  between  the  object  estimate  and  the  true  object. 
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3.3  Recursive  Problem  Formulation 


During  the  course  of  our  research  we  developed  a  new  problem  for¬ 
mulation  which  yields  improved  results  over  the  previous  batch  processing 
approach.  It  is  based  on  a  recursive  Bayes  formulation  to  compute  the 
object  conditional  density,  based  on  the  observed  images  to  date. 


In  our  work  we  have  used  the  following  model: 


*k(  i)  =  B  ( i )  [l-S(*(ik)]  +  'f'(ik)  +  t»k(i)  +  n0(i) 


(3.6) 


k  =  1,2,  .  .  . 

i  =  1,2,  .  .  . 


where 


SfcC i)  =  observation  intensity  at  pixel  i  and  time  k 

* ( i Vc)  =  object  intensity  at  pixel  ik  and  time  k.  The  index  ik  depends 
upon  the  motion  of  the  object.  Throughout  we  will  set  ij  =  i. 


B(i)  =  background  intensity  at  pixel  i 
n^Ci)  =  time-invariant  foreground  noise 
nk(i)  =  time-varying  foreground  noise 


S(f(ik) 


mk)  =  o 


(3.7) 


1  ;  otherwise 


Note  that  B(i)  is  t Ime- invar ian t .  Also,  without  loss  of  generality  we 


take  n^(i)  =  0.  We  will  assume  that  nk(i)  and  np)(i)  are  mutually  indepen¬ 


dent  gaussian  zero  mean  processes,  uncorrelated  over  both  space  and  time, 


E  [njc(i)2]  =  Nj  for  all  k 
E  ( hq ( i ) ^  ]  =  Ng 


Let  ¥  =  {¥(1)} 

*k  = 

Bk  =  {B(  1)  } 

Then  we  are  interested  in  the  probability 

PC'4',  »(<*>!,  *2,  •  •  *n) 


I 


By  Bayes'  Rule  we  get  the  recursion 


PC'1',  B  *!,  *n)  = 


PC ^n  I  *,  B,  .  ,  »n_t)  P(¥,  B  [  ^  .  V 

PC^n  ^  1  •  *n- 1 ) 


We  start  with  the  initial  conditions,  using  as  apriori  probability 


Po(¥,  B)  =  -1—  exp 
z0 


17  “if)  - 


where  Uj(.)  and  l^C*)  ate  appropriate  energy  functions.  For  example,  we 
can  utilize  Ising  potentials.  These  potentials  are  completely  arbitrary 
and  yield  a  Gibbs  distribution. 


Let 


PoC'1',  B)  =  i-  exp  [-  U0(¥,  B)] 
z0 

Then 
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P(*,  B  |  *j) 


P(*l  *,  B)  P0<¥,  B) 
P( * ! ) 


The  denominator  is  independent  of  f  and  B.  Thus,  if  we  want  a  point 
estimate  (Maximum  Aposteriori  e.g.)  we  need  only  evaluate  the  numerator. 
We  will  assume  we  are  looking  for  a  MAP  estimator  here.  Neglecting  terms 
independent  of  ¥  and  B  we  get  a  simplified  form. 


*!<i)  =  B(i)  [1-S(»(i)]  +  ¥(i)  +  n0(i) 

Recall  that  there  is  no  time-varying  noise  component  here. 


We  now  introduce  our  motion  model.  We  assume  that,  the  object  we  are 
interested  in  tracking  is  associated  with  ¥  in  the  following  way: 


(a)  if  ¥(i|c)  >  0,  then  the  object  is  within  pixel  i^  at  time  k 


(b)  if  ¥(![<)  =  0,  then  the  object  is  not  within  pixel  i^  at  time  k 


We  further  require  that  ¥(ik)  >  o  for  all  i,k. 


We  assume  here  constant  rectilinear  motion,  which  is  consistent  with 
the  space-based  target  tracking  problem  over  a  short  time  period.  Over  a 
longer  time  period,  the  tracking  algorithm  can  be  used  on  successive  data 
windows  to  achieve  accurate  tracking  for  non-rectilinear  motion.  Thus  we 
can  write,  in  two  dimensions 


't'(ik»  Jk)  =  vx,  j-k  vy) 


where  vx,  Vy  are  the  velocity  components.  This  is  the  discrete  form  of  the 
optical  flow  equation.  Returning  to  our  development  we  find  that  the  con- 
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ditional  density  for  the  first  image  is 


P(*l  |  B)  =  n  Pno  {*!(!)  -  B(i)  fl-s(»(i)]  -  f(i)} 

where  PnQ  is  the  probability  density  of  ng.  Combining  these  results,  we 
find  that  the  aposteriori  density  given  the  first  image  may  be  written  as  a 
Gibbs  distribution 


?(’*'•  B  |  *j)  -  ~  exp  {-  E !  ( 4*  0 ,  B,  *,)} 

where 

MV  B,  *!)  =  — 2^—  I  (*i(i)  -  B(i)  [l-S(Vil))l  " 

+  UgC'f,  B) 

We  now  go  to  the  next  time  step.  The  second  image  is  modeled  as 
®2(i)  =  B(i)  [1-S(4'0(i2)]  +  't'gC  i2)  +  n0(i)  +  n2(i) 

By  subtracting  $i(i)  we  get 

*2(i)  "  *i(D  =  B(i)  t-S('?0(i2>)  +  S(T0(ii))l 

+  *V*2)  “  V1!)  +  n2^i^ 


P(*.  B  |  *lt  $2)  = 


P(*2  |  *,  B,  *1)  P(Y,  B  |  *j) 

pTTTTTTI 


Now 


n  pn  t*2(i)  -  *i(i)  +  B(i)  (S(f(i2))  -  S(f(i))]  +  *(i)  -  f 

where  Pn  (.)  is  the  probability  density  of  n2. 


Thus 


P(f ,  B  |  *2)  =  —  exp  {-  E(Y0,  B,  *,)} 


where 


E(*0,  B,  *!,  *2)  =  U0  (V  B) 

2 

+  l  {$l(i)  "  Bi)  t 1-S( i) 1  -  »(i)> 

ZN 0  i 

+  — l  {*2(i)  -  *i(l)  +  B ( i )  [S(t(io))  -  S(*(i))1  +  *(1) 

2N  i  t 

We  can  see  immediately  by  induction  that 


P(f,  b)  *lf  *n)  =  i-  exp  {-  E(fn,  B,  *,  ...  ®n)} 

zn 

where 

E(»0,  B,  *lf  ...,  *n)  =  Un(V  B) 

+  l  {*l(i>  -  n-SC't'U))]  -  m)}2 

2N0  i 

+  l  1  (*k(i)  -  *i(D  +  B(i)  [  S(  f  ( ik) )  -  S(f(i))]  +  f(i) 

2N1  k  1 

This  expression  is  to  be  minimized  with  respect  to  ,  B,  v. 


Note  that  E  is  nonlinear  in  Y . 


Minimization  can  be  done  for  example  using  gradient  approximations  or 


stochastic  approximations  or  simulated  annealing.  In  our  work  to  date  we 
have  used  the  randomized  descent  method  described  in  Section  3.3,  which  is 
highly  paral lei izahl e. 

If  we  have  no  prior  information  on  B,  then  Up  ( Tp ,  B)  =  Up  (Tp). 
However,  if  we  assume,  say,  that  object  is  generally  brighter  that 
background  we  can  construct  an  appropriate  potential  function.  Here,  we 
have  assumed  no  prior  information  on  B. 

Experimental  Results 

We  present  results  on  the  same  case  as  presented  in  Figures  3.2  -  3.6, 
in  which  there  was  some  error  in  recovering  the  object.  Figure  3.7  shows 
the  object  at  its  original  position  (k  =  1)  which  is  moving  with  velocity 
v  =  (1,1)  pixel/sample.  Also  shown  is  the  background  which  had  mean  one 
and  rms  0.8  (gaussian).  Figure  3.8  shows  successive  cluttered  images  In 
which  the  object  is  moving  over  the  cluttered  background  in  additive  noise 
( Np  =  0.64,  Nj  =  0.16).  Figure  3.9  shows  the  object  recovery  for  three 
different  values  of  estimated  velocity  after  n  global  iteratios.  When  the 
velocity  matches  (panel  (a)),  the  global  minimum  energy  is  achieved  and  the 
object  is  recovered  exactly.  This  is  an  Improvement  in  performance  over 
the  results  presented  in  Figure  3.6.  Note  that  no  more  than  1!  global 
Iterations  are  required  for  recovery.  The  indicated  error  is  the  true  sum- 
squared  pixel  error. 


Since  recovery  was  perfect  for  this  case,  we  increased  the  noise  tc 


1.0,  Np  *  1,  Ni  »  0.25,  a  decrease  of  36%  in  signal/noise  ratio  for 
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(b)  velocity  mismatch  (v  =  (0,1)) 
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(c)  velocity  mismatch  (v  =  (1,0)) 
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each  noise  source.  The  cluttered  images  are  shown  in  Figure  3. in  and  object 
recovery  is  shown  in  Figure  3.11.  One  object  pixel  is  in  error  and  there 
are  three  pixel  artifacts. 

3. 4  Parallel  Implementations 

The  randomized  (Monte  Carlo)  procedure  we  have  used  for  finding  MAP 
solutions  to  the  Bayesian  estimation  problem  can  be  accelerated  signifi¬ 
cantly  by  using  a  parallel  architecture  for  implementation.  If  a  processor 
is  assigned  to  each  pixel,  for  example,  then  the  processing  time  will  be 
reduced  by  a  factor  of  n/k,  where  n  is  the  total  number  of  pixels  and  k  is 
the  chromatic  number  of  the  lattice.  The  chromatic  number  is  equal  to  the 
minimum  number  of  colors  needed  to  color  the  sites  of  the  lattice  in  such  a 
way  that  no  two  neighbors  have  the  same  color.  As  an  example,  in  our 
4-connected  neighborhood  used  for  experimentation  the  chromatic  number  is 
2. 

To  be  more  specific,  consider  solving  for  the  MAP  estimate  by  mini¬ 
mizing  (3.8)  using  a  massively  parallel  architecture  such  as  the  Connection 
Machine  (Hillis,  1985),  which  is  a  Single  Instruction  Multiple  Data  (SIMD) 
array  processor  consisting  of  256,000  processing  units.  Each  unit  has  a 
single-bit  ar i thme t i c/ 1 ogi ca 1  unit  and  about  4k  bits  of  storage  and  is 
organized  in  a  4-connected  lattice  that  is  512  elements  square.  At  each 
cycle  of  the  machine,  which  we  assume  here  to  have  a  duration  of  one  micro¬ 
second,  an  instruction  Is  executed  by  each  processor  and  a  single  bit  is 
transmitted  to  its  neighbors.  This  means  that  the  first-order  Markov  field 
we  have  assumed  here  can  be  efficientlv  implemented. 
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Figure  3.11  Object  Recovery  with  Increased  Noise. 
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form  a  single  16-bit  addition,  subtraction  or  comparison;  256  cycles  are 
required  to  perform  multiplication  or  division;  512  cycles  are  required  to 
generate  a  uniformly-distributed  random  variable;  and  16  cycles  are  needed 
for  memory  transfer.  We  also  assume  a  chromatic  number  of  two,  an  overhead 
factor  of  two,  20  global  iterations , and  six  sequential  images  to  be  ana¬ 
lyzed.  The  resulting  cycle  count  for  a  solution  is  1,699,840  which  yields 
a  solution  time  of  about  1.7  seconds.  Note  that  for  n  pixels  the  number  of 
processors  required  is  2  n  v  where  v  is  the  number  of  possible  velocity 
combinations  to  be  checked. 
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4.  SUMMARY  AND  CONCLUSIONS 


This  research  has  been  focused  on  the  moving  target  detection  and 
tracking  problem  using  image  analysis  techniques.  We  were  particularly 
interested  in  a  severely  cluttered  environment  in  which  high  degrees  of 
background  and  foreground  noise  were  present.  A  probabilistic  approach  has 
been  developed  using  an  integrated  Bayesian  approach.  Prior  knowledge  is 
employed  using  a  Markov  random  field  model  in  the  form  of  a  Gibbs  distribu¬ 
tion.  Using  probabilistic  noise  models,  the  aposteriori  distribution  is 
also  Gibbs.  No  knowledge  of  the  target  is  presumed,  other  than  a  local 
statistical  model  involving  an  arbitrarily-constructed  potential  function 
which  then  specifies  the  prior  distribution. 

The  framework  we  use  has  been  developed  recently  by  several  researchers 
and  has  yielded  promising  results  for  image  segmentation,  boundary  detec¬ 
tion  and  clutter  rejection.  In  this  research  we  have,  so  far  as  we  know, 
extended  the  framework  to  include  motion  analysis  for  the  first  time.  We 
develop  both  a  batch  processor  and  a  recursive  processor;  the  recursive 
processor  yields  superior  results  since  the  posterior  conditional  distribu¬ 
tion  is  theoretically  known  exactly.  The  batch  processor,  on  the  other 
hand,  is  a  smoother,  and  yields  ordinary  least-squares  solutions. 

We  have  developed  parallelizable  algorithms  for  computing  the  maximum 
aposteriori  estimate  of  the  image  field  containing  the  target.  These 
algorithms  are  of  the  Monte  Carlo  type  and  are  closely  related  to  the 
Metropolis  algorithm.  Numerical  results  are  presented  to  demonstrate  the 
potential  of  this  approach  in  solving  very  difficult  target  tracking 
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problems  in  an  extremely  noisy  environment.  We  speculate  that  the  results 
are  close  to  the  optimum  in  terms  of  accuracy  in  recovery  of  the  unknown 
moving  target. 

Parallel  implementation  is  discussed  and  an  example  given  to  demonstrate 
feasible  solution  times  on  current  array  processing  hardware. 

One  of  the  most  critical  aspects  of  the  problem  is  how  to  choose  the 
system  parameters  to  optimize  performance.  This  is  of  particular  impor¬ 
tance  when  autonomous  operation  is  required,  as  in  SDI.  Nonlinear  opti¬ 
mization  is  required,  once  the  interaction  effects  between  algorithm 
parameters  and  measurable  image  parameters  (such  as  statistical  variation, 
correlation,  etc)  is  worked  out.  This  problem  is  virtually  untouched  in 
the  literature  and  would  be  a  suitable  topic  for  Phase  II  research. 
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